Overview

Peregrine is a map reduce framework designed for running iterative jobs across partitions of data. It aims to be FAST at executing map reduce jobs by supporting a number of optimizations and features not present in other map reduce frameworks.

Status

The latest Peregrine beta release is 0.5.3. It is ready for production jobs but may not be as fast as the final version, as we are still landing important performance optimizations. We plan on releasing 0.6.0 in January 2012 and 1.0 around March 2012. 0.5.x is not designed for extensive crash recovery, but it should be very fast for production jobs on small clusters. Crashes can be resolved by simply restarting the job. This should be acceptable on smaller clusters but clearly not on larger ones (more than 100 machines).

Our goal is to be feature complete with the ability to execute across 10-40 nodes in the 0.5.0 timeframe and then work on handling failure in the 1.0 timeframe.

This will allow people to run Peregrine in production as soon as possible and see a return on their investment.

Features

Peregrine supports a number of optimizations and features not present in other map reduce frameworks including:

Performance

Modern design

Tight code base

Design

Peregrine is designed primarily for iterative map reduce applications that need to join against the previous iteration.

For example, algorithms like PageRank and k-means are iterative and join against data from the previous iteration.

Peregrine already has an implementation of PageRank, which we're using as a test bed to prove out the rest of our framework.
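
To make the "join against the previous iteration" pattern concrete, here is a minimal, framework-independent sketch of PageRank in Python. It is not Peregrine's API or its PageRank implementation. The function names, the in-memory dictionaries, the damping factor of 0.85, and the even redistribution of rank from nodes with no outbound links are all assumptions made for illustration. The sketch also assumes every node appears as a key in the link graph.

```python
from collections import defaultdict


def map_phase(links, prev_ranks):
    """Join each node's outbound links with its rank from the previous
    iteration and emit (target, contribution) pairs."""
    emitted = []
    dangling = 0.0
    for node, targets in links.items():
        rank = prev_ranks[node]  # the join against the previous iteration
        if targets:
            share = rank / len(targets)
            for target in targets:
                emitted.append((target, share))
        else:
            # Assumption: rank held by nodes without links is spread evenly.
            dangling += rank
    return emitted, dangling


def reduce_phase(emitted, dangling, nodes, damping=0.85):
    """Group contributions by target node and compute the new ranks."""
    n = len(nodes)
    sums = defaultdict(float)
    for target, share in emitted:
        sums[target] += share
    return {
        node: (1.0 - damping) / n + damping * (sums[node] + dangling / n)
        for node in nodes
    }


def pagerank(links, iterations=20):
    """Run a fixed number of map/reduce passes, feeding each pass the
    ranks produced by the previous one."""
    nodes = list(links)
    ranks = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        emitted, dangling = map_phase(links, ranks)
        ranks = reduce_phase(emitted, dangling, nodes)
    return ranks


if __name__ == "__main__":
    # Hypothetical four-node graph; "d" has no outbound links.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
    for node, rank in sorted(pagerank(graph).items()):
        print(node, round(rank, 4))
```

Each pass reads two inputs, the static link graph and the ranks written by the prior pass, which is the access pattern an iterative map reduce framework has to make cheap.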

Philosophy